Heavy-Tailed Analogues of the Covariance Matrix for ICA
Authors
Abstract
Independent Component Analysis (ICA) is the problem of learning a square matrix A, given samples of X = AS, where S is a random vector with independent coordinates. Most existing algorithms are provably efficient only when each Si has a finite and moderately valued fourth moment. However, there are practical applications, such as speech and finance, where this assumption need not hold. Algorithms have been proposed for heavy-tailed ICA, but they are impractical, relying on random walks and repeated invocations of the full ellipsoid algorithm. The main contributions of this paper are: (1) A practical algorithm for heavy-tailed ICA that we call HTICA. We provide theoretical guarantees and show that it outperforms other algorithms in some heavy-tailed regimes, on both real and synthetic data. Like the current state of the art, the new algorithm is based on the centroid body (a first-moment analogue of the covariance matrix). Unlike the state of the art, our algorithm is practically efficient. To achieve this, we use explicit analytic representations of the centroid body, which bypasses the use of the ellipsoid method and random walks. (2) We study how heavy tails affect different ICA algorithms, including HTICA. Somewhat surprisingly, we show that some algorithms that use the covariance matrix or higher moments can successfully solve a range of ICA instances with infinite second moment. We study this theoretically and experimentally, with both synthetic and real-world heavy-tailed data.
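As an illustration of the generative model the abstract describes, the sketch below (plain NumPy, with hypothetical dimensions and a Student-t source distribution chosen to mimic the heavy-tailed regime) draws independent sources S, mixes them with a square matrix A to form the observed X = AS, and verifies that the true unmixing matrix A⁻¹ recovers S. An ICA algorithm must find such an unmixing from X alone, without access to A.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 10_000  # hypothetical: 3 sources, 10k samples

# Independent heavy-tailed sources: Student-t with 3 degrees of freedom
# has finite variance but infinite fourth moment, the regime where
# fourth-moment-based ICA guarantees break down.
S = rng.standard_t(df=3, size=(d, n))

A = rng.normal(size=(d, d))  # unknown square mixing matrix
X = A @ S                    # observed mixtures: each column is one sample

# ICA's goal: recover A (up to permutation and scaling of its columns)
# from X alone. With the ground-truth A in hand, its inverse demixes:
S_hat = np.linalg.inv(A) @ X
```

Note that any algorithm can only recover A up to permutation and scaling of columns, since permuting or rescaling the independent coordinates of S yields an equivalent model.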
Similar Papers
Estimation of the covariance structure of heavy-tailed distributions
We propose and analyze a new estimator of the covariance matrix that admits strong theoretical guarantees under weak assumptions on the underlying distribution, such as existence of moments of only low order. While estimation of covariance matrices corresponding to sub-Gaussian distributions is well understood, much less is known in the case of heavy-tailed data. As K. Balasubramanian and M. Yu...
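The idea behind robust covariance estimation under low moment assumptions can be sketched with a simple truncation-based estimator: clip each (robustly centered) coordinate at a threshold before averaging, so that rare heavy-tailed outliers cannot dominate the average. This is an illustrative sketch only, not the specific estimator from the cited paper; the threshold `tau` and the median-centering step are assumptions made for the demo.

```python
import numpy as np

def truncated_covariance(X, tau):
    """Illustrative truncation-based covariance sketch.

    X   : (n, d) array of samples
    tau : clipping threshold (hypothetical tuning parameter)
    """
    Xc = X - np.median(X, axis=0)   # robust centering via the median
    Xt = np.clip(Xc, -tau, tau)     # damp heavy-tailed outliers
    return (Xt.T @ Xt) / len(Xt)    # Gram-matrix average: symmetric, PSD

rng = np.random.default_rng(1)
X = rng.standard_t(df=2.5, size=(5000, 3))  # heavy-tailed samples
Sigma_hat = truncated_covariance(X, tau=10.0)
```

By construction the estimate is symmetric positive semidefinite, since it is an average of outer products of clipped samples.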
Full Text
The Eigenvalues of the Sample Covariance Matrix of a Multivariate Heavy-tailed Stochastic Volatility Model
We consider a multivariate heavy-tailed stochastic volatility model and analyze the large-sample behavior of its sample covariance matrix. We study the limiting behavior of its entries in the infinite-variance case and derive results for the ordered eigenvalues and corresponding eigenvectors. Essentially, we consider two different cases where the tail behavior either stems from the iid innovati...
Full Text
New HEAVY Models for Fat-Tailed Returns and Realized Covariance Kernels
We develop a new model for the multivariate covariance matrix dynamics based on daily return observations and daily realized covariance matrix kernels based on intraday data. Both types of data may be fat-tailed. We account for this by assuming a matrix-F distribution for the realized kernels, and a multivariate Student’s t distribution for the returns. Using generalized autoregressive score dy...
Full Text
Novel Characteristic Function Based Criteria for ICA
We introduce two nonparametric independent component analysis (ICA) criteria based on factorization of characteristic functions. This approach has the potential to separate a wide class of distributions because the characteristic function always exists. A simple criterion allowing for efficient search of the separating matrix and a more advanced criterion possessing a desirable consistency property are pre...
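The factorization property these criteria exploit can be checked empirically: for independent coordinates, the joint characteristic function E[exp(i(t1·s1 + t2·s2))] equals the product of the marginal ones. The sketch below (assumed source distributions and evaluation points chosen for illustration) estimates both sides from samples; no finite moments are needed, which is why such criteria suit heavy-tailed data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
# Two independent heavy-/fat-tailed sources (illustrative choices)
s1 = rng.standard_t(df=3, size=n)
s2 = rng.laplace(size=n)

def ecf(vals, t):
    """Empirical characteristic function: mean of exp(i*t*vals)."""
    return np.mean(np.exp(1j * t * vals))

t1, t2 = 0.7, -0.4  # arbitrary evaluation points
joint = np.mean(np.exp(1j * (t1 * s1 + t2 * s2)))
product = ecf(s1, t1) * ecf(s2, t2)
# For independent s1, s2 the joint ECF factorizes into the product,
# up to O(1/sqrt(n)) sampling noise.
```

An ICA criterion built on this idea scores a candidate unmixing matrix by how far the joint ECF of its outputs deviates from the product of the marginal ECFs, over a set of evaluation points.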
Full Text
Spectral Measure of Heavy Tailed Band and Covariance Random Matrices
We study the asymptotic behavior of the appropriately scaled and possibly perturbed spectral measure μ̂ of large random real symmetric matrices with heavy-tailed entries. Specifically, consider the N × N symmetric matrix Y_σ^N whose (i, j) entry is σ(i/N, j/N)·x_ij, where (x_ij, 1 ≤ i ≤ j < ∞) is an infinite array of i.i.d. real variables with common distribution in the domain of attraction of an α-...
Full Text